Spark Execution Model

The Spark execution model can be described in three phases:
  • creating the logical plan
  • translating that into a physical plan
  • executing the tasks on a cluster

You can view useful information about your Spark jobs in real time in a web browser at http://<driver-node>:4040. For Spark applications that have finished, the Spark history server makes the same information available at http://<server-url>:18080. Let’s walk through the three phases and the Spark UI information about each phase, with some example code.
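
As a running example, here is a minimal sketch in Scala (run locally; the app name and the data are illustrative choices, not anything prescribed by Spark):

    import org.apache.spark.sql.SparkSession

    // Entry point to the DataFrame/Dataset API.
    val spark = SparkSession.builder
      .appName("execution-model-demo")
      .master("local[*]")   // run locally for illustration
      .getOrCreate()
    import spark.implicits._

    // spark.range produces a Dataset with a single column named "id".
    val ds = spark.range(0, 1000000)

    // Transformations are lazy: each call returns a new Dataset and records
    // lineage, but nothing runs on the cluster yet.
    val evens = ds.filter($"id" % 2 === 0)

    // An action (count) triggers planning and execution of a job.
    println(evens.count())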

The Logical Plan
In the first phase, the logical plan is created. This is the plan that shows which steps will be executed when an action is applied. Recall that when you apply a transformation to a Dataset, a new Dataset is created. The new Dataset points back to its parent, so the chain of transformations forms a lineage, a directed acyclic graph (DAG), describing how Spark will execute those transformations.
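
Continuing the sketch above, you can inspect the lineage Spark has recorded without triggering any execution; explain(true) prints the parsed, analyzed, and optimized logical plans (the column name "tens" is just an illustrative label):

    // Each transformation returns a new Dataset whose plan points back to
    // its parent, so chaining transformations builds up the DAG (lineage).
    val scaled = evens.select(($"id" * 10).as("tens"))

    // explain(true) prints the parsed, analyzed, and optimized logical
    // plans (followed by the physical plan) without running a job.
    scaled.explain(true)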

The Physical Plan
Actions trigger the translation of the logical DAG into a physical execution plan. The Spark Catalyst query optimizer creates the physical execution plan for DataFrames, as shown in the diagram below:
(Diagram: the Catalyst query optimization pipeline. Image reference: Databricks.)

The physical plan identifies the resources that will execute the plan, such as memory partitions and the compute tasks that operate on them.
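
Continuing the same sketch, calling explain with no arguments prints only the physical plan that Catalyst selected, and the partition count shows how many tasks each stage will run (the exact operator names in the output vary across Spark versions):

    // Prints just the selected physical plan, typically a WholeStageCodegen
    // block containing operators such as Range, Filter, and Project.
    scaled.explain()

    // At execution time, the physical plan runs as one task per partition.
    println(s"partitions: ${scaled.rdd.getNumPartitions}")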
